APPROXIMATION ALGORITHMS FOR THE 
NORMALIZING CONSTANT OF GIBBS DISTRIBUTIONS 



By Mark Huber 

Consider a family of distributions {tt/s} where X ~ tt@ means 
that ¥(X = x) = exp(-/3H(x))/Z(l3). Here Z(fi) is the proper nor- 
malizing constant, equal to ^2 exp(— f3H(x)). Then {77/3} is known 
as a Gibbs distribution, and Z(f3) is the partition function. This work 
presents a new method for approximating the partition function to 
a specified level of relative accuracy using only a number of samples 
that is 0(ln(Z(/3)) ln(ln(Z(/8)))) when Z(0) > 1. This is a sharp im- 
provement over previous similar approaches, which used a much more 
complicated algorithm requiring 0(ln(Z(/3)) ln(ln(Z(/?))) 5 ) samples. 



1. Introduction. The central idea of Monte Carlo methods is that the 
ability to sample from certain distributions gives a means for estimating 
the value of an integral or sum. This paper presents a new method for 
using samples to approximate a broad class of sums coming from Gibbs 
distributions that is faster than previously known methods. 

Definition 1.1. {n^pem is a Gibbs distribution with parameter f3 over 
finite state space f2 if there exists a Hamiltonian function H(x) : f2 — > R 
such that for X ~ up, 

F(X = x) = exp(-j3H(x))/Z(P), 

where Z((3) = Ylxen ex P(~fiH(x)) is called the partition funtion of the 
distribution. 

The partition function can be difficult to compute, even when dealing 
with simple problems. 

Example 1.1 (The Ising model). Given a graph G = (V,E), let ft = 
{—1, 1} V , and H[x) = — Yl{ij}eE = X C?))> where l(-) is the indicator 

function that is 1 if the argument is true and if it is false. Then the Gibbs 
distribution with this Hamiltonian is called the Ising model. Finding Z(f3) 
for arbitrary graphs is a #P-complete problem [8]. 
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A vast literature has arisen devoted to finding ways to generate random 
variables from Gibbs distributions (see for instance [9, 12] or [2] for an 
overview.) For the Ising model, Jerrum and Sinclair [8] gave an algorithm 
for approximately sampling from 7r« in polynomial time for any value of /?. 
Propp and Wilson [10] gave an algorithm for the Ising model that seems to 
run efficiently when j3 is at or below a cuttoff known as the critical value. 

Once an effective method for obtaining approximate or perfect samples 
from the target Gibbs distribution exists, the question becomes, what is the 
best way of using those samples to approximate Z(/3)? 

Definition 1.2. Say that A is an (e, 3/ '4) -randomized approximation 
algorithm for Z{f3) if it outputs value Z(f3) such that 

1 < -T^fzr < 1 + e > 3/4. 



\l + e- Z{p) 

Here e > controls the relative error between the approximation and the 
true answer. The 3/4 on the right hand side can be made arbitrarily close to 
1 by repeating the algorithm and taking the median of the resulting output. 

1.1. Previous work. The first step in building such an approximation 
algorithm is importance sampling. For most Gibbs distributions calculating 
Z(0) is straightforward, and it is easy to generate samples from ttq. For the 
Ising model, ttq is just the uniform distribution over {-l,l} y , and Z(0) = 
2# v . With a draw X ~ -kq in hand, let 

(1.1) W = exp(-pH(X)). 
Then 

F[wl = E, e oexp(-/3ff(x))exp(0) £(£) 

L J Z(0) Z(0)' 

making W ■ Z(0) an unbiased estimator of Z(j3). 

The relative performance of this Monte Carlo estimate is controlled by 
the relative variance, the square of the coefficient of variation. For a random 
variable X with finite second moment, Y rc \(X) = [E(X 2 )/E(X) 2 ] - 1. Hence 
for the random variable W as in (1.1): 

no) V (W)- 1 ■ £^ne*p(-/3ff(*)) 2 Z(0) 2 _ 1 | zmz(0) 

(1.2) yuw)--i + ^ zw~ ' 



There are two main issues with this relative variance: 
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1. For problems like the Ising model, this last ratio can be exponentially 
large in the input, making the method untenable. 

2. The relative variance involves the value of Z(2/3), outside the interval 
of interest [0, j3]. Typically, larger values of f3 make sampling from irp 
more difficult. This presents a serious impediment to the method. 

The first problem can be dealt with by using the multistage sampling method 
of Valleau and Card [14]. In this approach, a sequence of f3 values = /?o < 
Pi < /?2 <•••</?£ = (3 are introduced, called a cooling schedule. Then 



Each of the individual factors in the product on the right can then be es- 
timated separately and then multiplied to give a final estimate. Fishman 
called an estimate of this form a product estimator [5, p. 437]. 

It is simple to calculate the mean and relative variance of a product 
estimator in terms of the mean and relative variance of the individual factors. 

Theorem 1.1. For P = \\Pi where the Pi are independent, 



Let q = ln(Z ((3) / Z (0)) , and suppose H(x) 6 {0, . . . , n}. Then Bezakova et 
al. [1] introduced a fixed cooling schedule with two pieces, the first where the 
parameter value grows linearly, and the second where it grows exponentially: 



where k = \q~\ and 7 = 1 + 1/q. With this fixed cooling schedule, they give 
an (e, 3/4)-approximation algorithm that uses 0(g 2 (lnn) 2 ) samples in the 
worse case. 

By using an adaptive cooling schedule, it is possible to do better. In [7], 
the author and Schott introduced a general technique for finding normal- 
izing constants of sums and integrals called TPA. When applied to the 
specific problems of Gibbs distributions, the running time for an (e,3/4)- 
approximation algorithm becomes 0(q 2 ). 

To break the q 2 barrier for Gibbs distributions, Stefankovic, Vempala 
and Vigoda [13] created a multistage algorithm that adaptively created the 
cooling schedule. The resulting algorithm was highly complex, and they were 
interested primarily in the theoretical properties rather than a practical 
implementation. Their (e, 3/4)-approximation algorithm used at most 



Z((3) _ gQ9i) Z((3 2 ) Z((3 e ) 



Z(0) Z(p ) Z{fr) ZiPt-x)' 



E[P] = Y[E[Pi\, Y rel (P) = -1 + JJ(1 + V^CPi)). 




(1.3) 



10Vln(n) +ln(g)) 5 e~ 2 ) 
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samples on average from the target distribution. 

1.2. Main result. The multistage idea solves the issue of Z(2(3)Z(0)/Z((3) 2 
being too large, but fails to solve the issue of the variance depending on 
Z{2(3). Dealing with this leads to several of the In factors in [13]. In this 
work a new method is introduced, the paired product estimator which has a 
variance only involving quantities within [0, /3]. The result is an algorithm 
where the overal variance can be analyzed precisely. This allows for the con- 
struction of an approximation algorithm much simpler than that found in 
[13], and which requires far fewer samples. 

Theorem 1.2. Suppose n > 4 and e < 1/10. When H(x) £ {0,1, ... ,n} 
or {0, —1, —2, . . . , — n}, the new method is an (e, 3/4) -approximation algo- 
rithm that uses only 

(1.4) (q + 1)[5 + (2 + ln(2ra))(14.91n(100(2 + In(2n))(g + 1)) + 48.2e~ 2 )]. 

draws from the Gibbs distribution on average. In the more general case that 
\H(X)\ < n, the new method is an (e, 3 /4) -approximation algorithm that 
uses at most 

(q + 2n/3 + 1)[5 + 10.71n(69.4(g + 2n/3 + 1)) + 16.7e~ 2 ]. 
draws from the Gibbs distribution on average. 

It is possible to derive an upper bound on the number of samples used 
when n < 4 or e > 1/10, these assumptions just make the presentation 
cleaner. The first statement of the theorem is proved in section 3, and the 
second statement is proved in section 4.2. 

Section 2 describes the overall structure of the algorithm and shows how to 
obtain a good cooling schedule. Section 3 then analyzes the relative variance 
of the pieces of the algorithm. The next section then describes how to build 
a randomized approximation algorithm that uses the number of samples 
given in (1.4). Section 4 then considers some variations and extensions of 
the algorithm and how the number of samples needed changes. 

2. The Algorithm. Let q = \n(Z (0) / Z (/3)) . Then to obtain an approx- 
imation within a factor of 1 + e of Z(0)/Z(f3), it is necessary to obtain an 
approximation of q within an additive factor of ln(l-(-e). The main algorithm 
consists of the following pieces. 

1. Obtain an initial estimate of q. 

2. Obtain a well balanced cooling schedule. 
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3. Use the well balanced schedule with the paired product estimator. 

The first two pieces will be accomplished using TPA, introduced in [7]. To 
use TPA for Gibbs distributions on parameter values [0, 0\, it is necessary 
that H{x) be either always nonnegative or always positive. In section 4.2 it 
is shown how to deal with the situation where the sign of H{x) changes over 
the state space, but for now assume that the sign is unchanging. 

In the Ising model example shown earlier, H{x) < 0, and so Z(j3) is an 
increasing function of /3. In this case, TPA is an algorithm that generates a 
random set of parameter values in the interval from to /3 by taking samples 
from TTb for various values of b E [0, 0\. The output of TPA is a Poisson point 
process (PPP) of rate 1 in [z(0), z((3)]. (See section 2 of [7].) 

Algorithm 2.1. TPA for Gibbs distributions with H(x) < takes as 
input a value f3 > together with an oracle for generating random samples 
from 7T5 for b £ [0, (3], and returns a set of values < b\ < 62 < • • • < W < b 
such that {z(bi), . . . ,z(be)} forms a Poisson point process of rate 1 on the 
interval [z(0), z(/3)]. It operates as follows. 

1. Start with b equal to /3 and B equal to the empty set. 

2. Draw a random sample X from 71&, and draw U uniformly from [0, 1]. 

3. Let b = b — ln(U) / H (X) , unless H(X) = 0, in which case set b = —00. 

4. If b > 0, then add b to the set B, and go back to step 2. 

The number of samples drawn by TPA will equal 1 plus a Poisson random 
variable with mean q [7, pp. 3-4]. The output of Algorithm 2.1 can be used 
in several different ways. When TPA is run k times and the output sets 
combined, the result is a Poisson point process on [z(0), z(/3)] of rate k. 

It is even possible to obtain rates that are fractional. To obtain rate k 
where k is not an integer, first run TPA [~Af| times. Then for each point of the 
process, keep it independently with probability k/\k~\. Otherwise discard it 
entirely. This procedure, known as thinning, enables creation of a PPP of 
any positive rate, which will simplify the analysis later. (See [11, p. 320] for 
more on thinning.) 

After a PPP of rate k has been generated, the number of points in the 
process has a Poisson distribution with mean k{z{(3) — z(0)). This gives a 
way of initially getting an estimate of z(ft) — z(0) that (by choosing k high 
enough) has a 99% chance of being within a factor of 2 of the correct value. 

Once that is accomplished, TPA is run, this time with an even larger value 
of k based on the estimate from the first step. Because the z(b) values form 
a Poisson point process, the difference between successive z(b) values will 
be an exponential random variable, so if b' is the dth point following b, then 
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z(b') — z(b) will have a gamma (Erlang) distribution with shape parameter d 
and rate parameter k. By making k and d large enough, this will be tightly- 
concentrated around its mean value of d/k for all such differences. 

The result is a set of parameter values {/%} that are well balanced in the 
sense that for these z(/3j+i) — z(/3i) are all close to the same small value. 

Call interval i. Now each z(/3i + \) — z(/3i) will be estimated in- 

dependently using the paired product estimator. This works as follows. For 
each interval i, let m,i = (/3j + /3j+i)/2 be the midpoint of the interval, and 
5i = m,i — fii = Pi+i — m,i be the half length of an interval. Draw X ~ ir^, 
and Y ~ wp i+1 - Then set 

Wi = exp(-5iH(X)), Vt = exp(6iH(Y)) 

Then 



nwi] 



Z{Pi) Z{Pi) Z(Pi 



Similarly, E[V£] = Z (rrii) / Z (f3i + \) . Therefore, Wi can be used to estimate 
the drop z(rrii) — z(/3i), and Vi can estimate the drop z(/3i+i) — z{rrii). 
Now for the relative variance calculation. 



2 



V (w\ K[W ? ] i 1 i E^M-^H(x))exp(-5 t H(x)) 2 Z(ft) 2 

VrelW = W" + m) Wh) 

= -1 + gfell^ since A + 2^ = 

A similar calculation shows that V re i(V^) = V re i(Wj), and now the variance 
of our estimators for interval i only involves Z(b) values for b that fall in 
interval i. 

Let W be the product of the Wi over all intervals i, and V be the product 
of the Vi. Then the final estimate of Z(f3)/Z(0) is W/V. This is not quite 
an unbiased estimator, but it is true that E[W]/E[F] = Z(f3)/Z(0). If both 
W and V are tightly concentrated around their means, then W/V will be 
close to Z(f3)/Z(0). To get that tight concentration, in the next section it 
is shown that the relative variance of W (and V) is small as long at the f3 
values form a well balanced schedule. 

With that small relative variance it is possible to repeatedly draw inde- 
pendent, indentical copies of W to get a sample average W which is tightly 
concentrated about its mean. (The same is true for V as well.) The following 
algorithm incorporates these ideas. 
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Algorithm 2.2 (Paired product approximation algorithm). The input 
is a value /3 > together with an oracle for generating samples from 7Tb for 
b G [0,/3]. The output is an approximation for Z(f3)/Z(0). 

1. Run TPA 5 times to get an estimate of q = ln(Z (fi) / Z (0)) that is at 
least q/2 with probability 99%. 

2. Run TPA k times to obtain a set of parameter values. Sort these values 
and then keep every d successive value. Add parameter values and 
P and label the result = /3 < Pi < ■ ■ ■ < fit = fi- 

3. Repeat the following [~2e\/l0((l + e) 1 / 2 — 1)~ 2 ] times: for each i, draw 
Xi ~ 7r ft , let W % = expi-SiHiXi)) and V- = exp(<5;# (X i+1 )) , W = 
Y\ Wi and V = Y\ Vi ■ Take the sample average of the W values to get 
W, and the sample average of the V values to get V. 

4. The estimate of Z(/3)/Z(0) is W/V. 

Note that ((1+e) 1 / 2 — 1)~ 2 ~ 4e~ 2 . It is necessary to use this more complex 
expression because the final estimator is the ratio of W and V (see the proof 
of Theorem 3.2.) Algorithm 2.2 can be run for any values of d and k, the 
next section shows how to choose them properly to make Algorithm 2.2 an 
(e, 3/4)-approximation algorithm. 

3. Analysis. In this section the following theorem is shown. 

Theorem 3.1. In Algorithm 2.2, let q\ be the size of the Poisson point 
process created with 5 runs of TPA in step 1. Let 

d= [221n(100(2 + ln(2n))(gi + 1/2))], and k = (2/3)d[2 + ln(2n)]. 

Then the algorithm output is within 1 + e of Z(f3)/Z(0) with probability at 
least 3/4. 

Let q = ln(Z(/3)/Z(0)). The proof breaks into three parts. The first shows 
that by running TPA 5 times, the probability that q\ + 1/2 < {l/2)q is at 
most 1%. The second part shows that with the choice of k, the probability 
that the schedule is not well balanced is at most 4%. Finally, the third part 
shows that the third step of the algorithm produces W and V that are both 
within 1 + e/2 of their respective means with probability at most 20%. The 
union bound on the probability of failure is then 1% + 4% + 20% = 25%, as 
desired. 

3.1. The initial estimate q\. Recall that Algorithm 2.1 has output that is 
a Poisson point process with rate 1. Let k\ denote the number of times that 
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TPA is run and the output combined. Then the new PPP has a rate of k\. 
Therefore the number of points in the PPP is Poisson distributed with mean 
k\{z{(5) — ^(0)). The following lemma concerning Poisson random variables 
then shows that q\ + l/2 is at least 1/2 of its mean with probability at least 
Vo. 



Lemma 3.1. Let X have Poisson distribution with mean fj,. Then ¥(X < 
fi/2) < 2(^)- 1 /2( 2 / e )^/2. 

Proof. Suppose n/2 = \n/2]. Then 

P(X < M /2) = exp(- M ) V ^ < exp(- M )2/i— -. 

i<»/2 *• [fx/Z) - 

The last inequality comes from the fact that each term in the sum is at least 
twice the previous term. The Sterling bound i\ > \/2iri(i/e) 1 gives P(X < 
H/2) < 2{TT^)- l / 2 {2/eY' 2 . Now suppose fi/2 / \n/2\. Let // = 2^/2]. 

P(X < n/2) < P(X < fjf/2) < 2{Trn')- 1/2 {2/eY' /2 < 2(^)- 1 / 2 (2/e)^ t . 

□ 



Suppose step 1 runs k\ repetitions of TPA. Then q\ has a Poisson distri- 
bution with mean k\q. If q < 1 then it is always true that gi + 1/2 > (l/2)q. 
If q > 1 then setting &i = 5 and using Lemma 3.1 makes the probability of 
failure below 1%. 

3.2. The well balanced schedule. Now consider the second step in Algo- 
rithm 2.2. First, run TPA A; times to get a set B that is a PPP of rate k on 
the interval [z(0), z(/3)]. Since B is a PPP of rate k, if b < b' are values in 
B such that there are exactly d — 1 values in (6, 6'), then z(b') — z(b) has a 
gamma distribution with parameters d and fc. That is equivalent to saying 
z(b') — z(b) has the distribution of the sum of d independent exponential 
random variables each with rate k. Hence the moment generating function 
of z(b') — z(b) is [k/(k — t)] d . Let t and n be nonnegative real numbers, then 

P(z(b') - z(b) >rj) = P(exp(i(z(6') - z(b)) > exp(r]t)) 

= [k/(k — t)] d exp(— nt) by Markov's inequality 
= {rjk/d) d exp(— rjk + d) by setting t = k — d/n. 
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On the other hand, for t > 0, multiplying by —t and exponentiating gives 
F(z(b') - z(b) < tj/2) = P(exp(-i(z(&') - zip)) > exp(-7?t/2)) 

= [k/(k + t)] d exp(??t/2) by Markov's inequality 

= (r]k/(2d)) d exp(-7?£;/2 + d) by setting t = 2d/r] - k. 

So if d = (3/4)77/0, then from the union bound 

F(rj/2 < z(b') - 2(6) < rf) > 1 - [exp(-l/3) • 4/3] d - [exp(l/3) • 2/3} d . 

For the PPP, the chance that z(b) - z(b') £ [r//2, 77] for the first 2t/~ 1 (z(/3) - 
.z(O)) intervals to the left of (3 is (again by the union bound) at least 1 — 
2rr 1 (z(/3) - z(0))2[exp(-l/3) • 4/3] d . Making 

ln(0.04(4r/- 1 (z(/3) - z(O)))" 1 ) _ ln^OOry- 1 (z(J3) - z(0)) 
~ -(l/3)+ln(4/3) ~ 1/3 - ln(4/3) 

would make this probability at least 96%. However, q = z(f3) — z(0) is 
unknown. What is known (from step 1 of Algorithm 2.2 is 2(qi + 1/2) has a 
96% chance of being at least q. Since (1/3 — ln(4/3)) _1 = 21.905 . . ., setting 

d = [221n(200r ? - 1 (g + 1/2))] 

and k = (4/3)d/r/ makes the chance that step 2 fails to find a schedule where 
z(b) — z(b') > 1 for any interval at most 4%. 

3.3. Choosing r\. The next question to consider is the size of 77. The 
value of 77 will be used to control the overall relative variance of the product 
estimators W and V. For the ith interval let rrn = f (/% + /3j + i))/2 

be the midpoint of the interval. Let 5i be the difference between the y- 
coordinate of the midpoint of the interval secant line and the function value 
at the midpoint of the interval. That is, 

ftggCfttlhMft) - z{m ). 

From (1.2), V re i(Wj) = exp(2<5j) — 1. Since the relative variance is always 
nonnegative, this implies that 5i > and so the function z is convex. 
From Theorem 1.1, 

(3.1) Y vcl (W) = -1 + JJ(1 + exp(25 4 ) - 1) = -1 + exp (j^ ^ 

So controlling the overall relative variance is a matter of bounding <5j for 
each interval i. The key idea in the bound comes from [13], although they use 
it in a very different fashion. The idea is that when Si is large, the derivative 
of z sharply increases. 
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Lemma 3.2. For the ith interval [fa,fa + i] with z(fa + ±) — z{fa) = rji, 

> exp(4<5j/r?j). 



z'(fa+i) 



z'(fa) 

Proof. Let m« = (fa + fa + {)/2 be the midpoint of interval i, and r/j = 
z(Pi + i) — z((3i) be the change in the z function over the interval. Since z is 
convex, the slope at fa is at most [z(rrii) — z(fa)]/[irii — fa]. On the other 
hand, the slope at fa + \ is at least [z(fa+i) — z(nii)]/[fa + i — rrii]. Since rrii is 
the midpoint of the interval, mi — fa = fa+\ — mi, and 

z'{fa+i) zjfa+i) -z( mi ) _ rg/2 + 5j _ 1 + 25 i /r ]i 
z'(fa) ~ z( mi )-z(fa) TH/2-Si l-25 i / Vi - em * di/rii) - 

□ 

Lemma 3.3. For a cooling schedule over [0,(3] with z(fa + \) — z(fa) < rj 
for all i, 

V rel (W) = Y rel (V) <2e r >l2z>(P)Y>/ 2 . 
In particular, for n > 4 and rj = 2/[2 + ln(2n)], 

Yrel(W) = Y rd (V) < 2e. 

Proof. Recall that V re i(W) < exp(2 ^ i <5j) so the goal is to bound J2 i Si. 
Consider a cooling schedule = fa < fa < ■ ■ ■ < fa = (5. It is well known 
that z'{fa) is just E[-H(X)] where X ~ tt^ : 

Case I: z'{fa) < 1/2. Then H(x) < -1 => > 1 so 

E X :H(x)<-i- H ( x ) e M-PH(x)) i £ a:g(a) <_ ie xp(-ftH-(3;)) 1 
Z(/3) " 2 ^ Z(/3) " 2 

_ E :r :ff( :r )=O eX P(-^( a; )) 1 



Z(/3) " 2 

Z(/5) " 2 

Hence z(/3) - z(0) < ln(2) which means 2S { < ln(2) and exp(^ 25;) < 2. 
Case II: z'(0) > 1/2. Then 2z'(/3) > z'{P)/z'(0), and from the last lemma 

Afa)_Afa) z'(fa) „ 

,'(o) " ^(A) *{Pi-i) -ll e p ^/^- 
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Raising to the rj/2 power then finishes this case. 

Case III: z'(0) < 1/2 < z'(/3). Since z' is continuous, let a £ [0, f}\ be the 
parameter value where E[— H(X)] = 1/2 for X ~ TT a , and suppose a is in 
the jth interval [(3j, (3j+i\- As in Case I, Z(f3j)/Z((3o) < 2. As in Case II, 
Yl^j exp(4<5j) < [2z'(/3)] r? / 2 . Since 25 j < n, that means the combined relative 
variance is at most 2e^ [2z'(/3)] r ?/ 2 . 

Since zf(fi) = E[-H(X)] for X ~ vr^, and X < n, z'(/3) < n. Hence if 
77/2 < l/[2 + ln(2n)], then e*^ (fi)] 71 / 2 < e. □ 

Proof of Theorem 3.1. Using the value of d from section 3.2 and 
Lemma 3.3 gives that the relative variance for an instance of W (or V) 
is at most 2e. All that remains is to analyze the third step of Algorithm 2.2. 
It is easy to verify that if W is the sample average of r independent, iden- 
tically distributed (iid) instances of W, then Y re \(W) = Y re \(W)/r. Let 
e = (1 + e) 1 / 2 - 1. For \2eVWe- 2 ] iid draws of W, Y vcl {W) < e~ 2 /10. 

Chebyshev's Inequality says that for a random variable X with finite 
relative variance: P((l - e)E[X] < X < (1 + e)X) > 1 - V rc i(A)e 2 . Hence 

P((l + e) _1 E[W] < W < (1 + e)E[W]) > 1 - 1/10. 

Similarly, P((l + e^EfU] <V<(1 + e)E[V]) > 1 - 1/10. 

Therefore, the chance that step 1 successfully gives a basic estimate of 
ln(Z(/3)/Z(0)), step 2 creates a well balanced schedule, and step 3 gives W 
and V both within a factor of (1 + e) of their respective means is at least 
1 - 1/100 - 4/100 - 1/10 - 1/10 = 75% by the union bouiKh 

If both W and V are within 1 + e of their means, then W/V is within 
(1 + e) 2 = 1 + e of E[WyE[V"] = Z(fi)/Z(0), completing the proof. □ 

3.4. The running time of the basic algorithm. How many samples does 
Algorithm 2.2 take on average? 

Theorem 3.2. When n > 4, and e < 1/10, Algorithm 2.2 takes on 
average at most 

(q + 1)[5 + (2 + ln(2n))(14.91n(100(2 + ln(2n))(g + 1)) + 48.2e~ 2 )]. 
samples. For fixed e the number of samples is 0(g[ln(n)(ln(g) + m(m(n)))]). 

Proof. A run of TPA uses a number of samples that is one plus a Poisson 
random variable with mean z{(5) —z(0), so on average q + 1 samples. So step 
1 takes 5q + 5 samples on average. From the concavity of the In function and 
Jensen's inequality, the second step takes at most 

f(2/3)(2 + ln(2n)) [221n(100(2 + ln(2n))(g + l))]q 
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samples on average. This is bounded above by 

g[14.9(2 + ln(2n)) ln(100(2 + ln(2n))(g + 1))]. 

The resulting schedule has on average at most q/(d/k) + 1 = (2/3)[2 + 
ln(2n)]g + 1 intervals in it, and so the third step of the algorithm generates 
a number of sample that (on average) is at most 

(2ev/l0)(2/3)(2 + ln(2n))(g + 1)((1 + e) 1/2 - 1)~ 2 . 

When e < 1/10, (1 + e) 1 / 2 — 1 > e/2.05, so the number of samples in this 
section can be bounded by 

48.2(2 + ln(2n))(< ? + l)e~ 2 . 

□ 

4. Variations and extensions. The previous two sections gave an 
algorithm for when —H(x) £ {0, 1,2,..., n}. In this section algorithms are 
given for when H(x) > 0, or for when H{x) changes sign. The section ends 
with a discussion of the use of approximate rather than exact samples. 

4.1. Dealing with a nonnegative Hamiltonian function. When H(x) > 
0, then Z(/3) is a decreasing function of (3. The analysis of the relative 
variance of W and V remains identical, the only thing that changes is the 
TPA algorithm for obtaining the initial estimate of q and the well balanced 
schedule. 

Algorithm 4.1. TPA for Gibbs distributions with H(x) > takes as 
input a value (3 > together with an oracle for generating random samples 
from 7T& for b 6 [0, f3], and returns a set of values < b\ < 62 < ■ ■ ■ < be < b 
such that {z(bi) , . . . , z(bf) forms a Poisson point process of rate 1 on the 
interval [z(0), z(/3)]. It operates as follows 

1. Start with b equal to 0, and B equal to the empty set. 

2. Draw a random sample X from 71-5, and draw U uniformly from [0, 1]. 

3. Let b = b — \n{U) / H{X), unless H{X) = 0, in which case set b = 00. 

4. If b < (3, then add b to the set B, and go back to step 2. 

Everything else about the algorithm remains identical. 
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4.2. Dealing with a Hamiltonian that is both positive and negative.. Sup- 
pose that H(x) G {— n, —n + 1, . . . , —1, 0, 1, ... , n}. Then neither of the 
previous sections apply. However, it is possible to shift the H(x) function 
without changing the distributions TTp. Consider -ff s hift( x ) = H{x) + c. Then 
for X ~ tt/3 with ff s hift(ic): 

exp(-/3F sh if t (x)) exp(-/3c)exp(-/3i7(x)) 



Z s hm((3) -Z s hift(/3) 

so ^ s hift(/3) = Z(f3) exp(— /3c). This means that by making c = — n, it is 
possible to insure that H(x) < 0. But even better, if c = — 2n, it is possible 
to insure that H(x) G {— 2n, —2n + 1, . . . , — n}. 

This means that z'(/3)/V(0) < 2, so the extraneous ln(n) factor from 
Theorem 3.2 is removed. 

Lemma 4.1. For H(x) G {— 2n, —In + 1, . . . , — n}, and rji < n, 
Y rel (W)=Y rel (V)<2^ 2 . 

PROOF. Recall Y rcl (W) = Y rcl (V) = ^exp^/n), and z' (f3 l+1 ) / z' (fo) > 
exp(4:6i/r]i). So 



z'(0) z'Wo) zf{Pi- X ) 



> JJexp(45 i /r ?i ). 



Since H(x) G {-2n,-2n + l,...,-n}, so is E[-H(X)] for X ~ vr or 
X ~ 7r«, so z'(/3)/z'(0) < 2. Raising to the r//2 power finishes the proof. □ 

There is a cost to the shift: Z shift (/3)/Z shift (0) = Z(/3) exp(-/3(-2n))/Z(0), 
so Qshift = Q + 2n/3. Using n = 2/ ln(2) then bounds the relative variance by 
e as before. 

Theorem 4.1. In Algorithm 2.2 applied to the distribution with the 
shifted Hamiltonian, let q\ be the size of the Poisson point process created 
with 5 runs of TPA in step 1. Let d = [221n(200(ln(2))- 1 (gi + 2nf5 + 1))], 
and k = (2/3) ln(2)cL Then the algorithm output is within 1+e of Z(f3)/Z(0) 
with probability at least 3/4- 

Proof. Let n = 2/ln(2) so that Y Te \(W) = V m \(V) is bounded above by 
e, then the rest of the proof is identical to that of Theorem 3.1. □ 

Theorem 4.2. The average number of Gibbs distribution draws used by 
Algorithm 2.2 run with the parameters in Theorem 4-1 is bounded above by 

(q + 2n/3 + 1)[5 + 10.71n(69.4(g + 2n/3 + 1)) + 16.7e~ 2 ]. 
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PROOF. With 77 = 2/ln(2), Lemma 4.1 yields V re i(V) = V re i(W) = e. 
Then the first step of the algorithm uses on average 5(q + 2nf3 + 1) samples, 
the second step uses on average at most 

(fchift + l)r(4/3)(ln(2)/2)r221n(200(ln(2)/2)(g shift + 1))]] 

steps. The final step then takes on average 

g sh i ft rev / 10((l + e) 1/2 - ir 2 l(2/3)ln(2) < 16.7g shift e- 2 

draws from the Gibbs distribution. Adding these together then gives the 
result. □ 



4.3. Dealing with approximate samples. Until now, the algorithms con- 
sidered have been analyzed under the assumptions that samples drawn ex- 
actly from are available. While this is true in many instances (see for 
example [4, 6, 10]) is is not true for all problems. For instance, the only 
algorithm for generating samples from the Ising model that provably runs 
in polynomial time is a Monte Carlo Markov chain (MCMC) approach of 
Jerrum and Sinclair [8] where the samples obtained are arbitrarily close to 
the 7rg distribution. 

This problem of using approximate instead of exact samples can be dealt 
with in a simple way using a technique from [13]. Consider the total varia- 
tion distance between distribution tt and r: TV(7r,r) = (1/2)^ l 7r ({ :r }) ~~ 
t({x})|, It is always possible to create a coupling [3], a bivariate random 
variable (X, Y) such that X ~ vr, Y ~ r and F(X ^ Y) = TV(vr, r). 

By making the value of S large enough that the probability that the 
algorithm requires more than S samples is small, then each approximate 
sample can be coupled with a perfect sample. The probability of failure can 
then be bounded (with the union bound) by the probability that the perfect 
samples fail to approximate the target plus the probability that any of the 
perfect samples are different from the approximate samples. 
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